Document last updated: 2025-04-14



1 Methods

1.1 DNA extraction and amplicon sequencing

We extracted and sequenced DNA from a total of 245 samples, comprising 5 random stool samples from before the experimental diets began (referred to as Day 0 or Week 0), while all mice were fed their standard “Control-diet”, and then weekly collections of 5 random stool samples per cohort over the next 12 weeks (5 replicates x 4 cohorts x 12 weeks). DNA extractions were performed using []. Taxonomic profiling was performed by sequencing bacterial 16S rRNA genes. The V4 region of bacterial (and archaeal) 16S rRNA genes was amplified using primers 515F/806R (Bates et al., 2010). PCR amplifications were performed using previously described methods (Mueller et al., 2016). In the first PCR, sample barcoding was performed with forward and reverse primers each containing a 6-bp barcode; 22 cycles with an annealing temperature of 60°C were performed. The second PCR added Illumina adaptors over 10 cycles with an annealing temperature of 65°C. Amplicon clean-up was performed with a 0.9 ratio of AMPure XP beads (Beckman Coulter, Indianapolis, IN) following the manufacturer’s instructions, and final elutions were performed with 30 µL Elution Buffer. Following clean-up, samples were quantified with an Invitrogen Quant-iT™ dsDNA Assay Kit on a BioTek Synergy H1 Hybrid Reader and pooled at 10 ng per sample. A final clean-up step was performed on pooled samples using a 0.9 ratio of AMPure XP beads. Samples were sequenced on an Illumina MiSeq platform with PE250 chemistry at Los Alamos National Laboratory. Unprocessed sequences are available through NCBI’s Sequence Read Archive ().

1.2 Microbial community sequence analysis

Bacterial sequences were processed using USEARCH v11 (Edgar, 2010). Samples were demultiplexed, paired ends were merged, and reads were quality filtered and globally trimmed using a fastq_maxee threshold of 1.0 (Edgar and Flyvbjerg, 2015), dereplicated, and stripped of singletons. Chimeras were removed and 97% OTU clustering was performed independently for the two datasets with the -cluster_otus command using the UPARSE-OTU algorithm (Edgar, 2013). Previous analyses have shown congruent ecological patterns with use of OTUs versus exact sequence variants (ESVs) for delineating microbial taxa (92). OTU tables were created using the -otutab command. Bacterial OTUs were classified using the Ribosomal Database Project (RDP) classifier v.19 (Wang et al., 2007). Next-generation sequencing of 16S rRNA genes resulted in 5,043,233 reads (average of 20,585 ± 3,467 (SD) reads per sample, n = 245 samples). These reads yielded 1,090 OTUs. Domain-level analyses revealed that 99.99% of reads were classified as “Bacteria”, 0.003% as “Eukaryota”, and 0.01% were unclassified at the domain level. The dataset was then filtered to exclude all domains except Bacteria, all reads that could not be assigned at ≥ 80% confidence at the phylum level (n = 12,991), reads assigned to the class Chloroplast (n = 122), and the remaining singleton reads (n = 32). To account for uneven sequencing depth, we rarefied via subsampling without replacement to 13,006 sequences per sample; the rarefied dataset contained 624 bacterial OTUs (97% sequence similarity) from 243 samples.
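As an illustration of the rarefaction step only, here is a minimal Python sketch of subsampling without replacement to a fixed depth. It is not the pipeline used for the analysis; the function name `rarefy`, the toy counts, and the choice to return `None` for samples below the target depth are all assumptions made for the example.

```python
import random
from collections import Counter

def rarefy(counts, depth, seed=0):
    """Subsample one sample's OTU counts to `depth` reads without replacement.

    Returns None when the sample holds fewer than `depth` reads (an assumed
    convention here: such samples would be dropped from a rarefied table).
    """
    total = sum(counts.values())
    if total < depth:
        return None
    # Expand counts into one label per read, then draw reads without replacement.
    reads = [otu for otu, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    return dict(Counter(rng.sample(reads, depth)))

# Hypothetical toy samples (OTU -> read count); not data from this study.
deep = {"OTU_1": 60, "OTU_2": 30, "OTU_3": 10}
shallow = {"OTU_1": 5, "OTU_2": 2}

rarefied = rarefy(deep, depth=20)
dropped = rarefy(shallow, depth=20)
```

Because reads are drawn without replacement, no OTU can end up with more reads than it had originally, and every retained sample has exactly the same total.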

Phylum-, class-, order-, family-, and genus-level bar plots of the community profiles by week across the 12-week longitudinal study can be found in SI Figures XYZ (or here: Taxonomy).

1.3 Statistical analysis

Microbial community analyses were conducted primarily in the vegan (Oksanen et al., 2018) and phyloseq (McMurdie and Holmes, 2013) packages in the R programming environment unless otherwise noted. Patterns in microbial community composition were visualized using non-metric multidimensional scaling (NMDS) using Bray-Curtis (abundance-weighted) and Jaccard (binary presence/absence) distance metrics.
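To make the two distance metrics concrete, the sketch below computes Bray-Curtis and Jaccard dissimilarity for a pair of samples in plain Python. The function names and the toy count vectors are hypothetical; the analysis itself relied on the R packages named above.

```python
def bray_curtis(x, y):
    """Abundance-weighted dissimilarity: 1 - 2 * shared abundance / total abundance."""
    shared = sum(min(a, b) for a, b in zip(x, y))
    return 1 - 2 * shared / (sum(x) + sum(y))

def jaccard(x, y):
    """Presence/absence dissimilarity: 1 - |intersection| / |union| of detected OTUs."""
    present_x = {i for i, a in enumerate(x) if a > 0}
    present_y = {i for i, b in enumerate(y) if b > 0}
    return 1 - len(present_x & present_y) / len(present_x | present_y)

# Hypothetical counts for four OTUs in two samples.
sample_a = [10, 0, 5, 5]
sample_b = [5, 5, 0, 10]

bc = bray_curtis(sample_a, sample_b)
jac = jaccard(sample_a, sample_b)
```

Bray-Curtis responds to how many reads each OTU contributes, whereas Jaccard only registers whether an OTU was detected, which is why the two can rank the same pair of samples differently.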

We investigated the degree to which differences in microbial community profiles were explained by experimental factors ….

We also examined differences across initial …

1.3.1 Week-0 baseline characterization

To quantify inter-individual variability prior to dietary intervention, we analyzed five stool samples collected at baseline (Week-0), before diet assignment. Alpha diversity was measured using both observed OTU richness and Shannon entropy. Taxonomic profiles were summarized at the family level using relative abundance from a transformed OTU table. Pairwise Bray–Curtis dissimilarities were computed to assess baseline community divergence. Additionally, we calculated distances to the group centroid using PERMDISP to quantify β-dispersion. These metrics were used to establish the null range of baseline variation, providing an ecological benchmark against which subsequent compositional shifts were interpreted.
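The two alpha-diversity measures can be sketched as follows. This is an illustrative Python version under the assumption that Shannon entropy uses the natural logarithm (the text does not state the base); the function names and counts are hypothetical.

```python
import math

def observed_richness(counts):
    """Number of OTUs detected (count > 0) in one sample."""
    return sum(1 for c in counts if c > 0)

def shannon_entropy(counts):
    """Shannon entropy H = -sum(p * ln p) over detected OTUs (natural log assumed)."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

# Hypothetical sample: four equally abundant OTUs and one undetected OTU.
toy_sample = [25, 25, 25, 25, 0]

richness = observed_richness(toy_sample)
entropy = shannon_entropy(toy_sample)
```

For a perfectly even community, entropy equals the log of richness, so the gap between the two is a rough indication of how uneven a sample is.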

1.3.2 Multi-site (sample) dissimilarity partitioning

Using the R package betapart (v1.6), we computed the three abundance-based multiple-site dissimilarities (balanced variation fraction, abundance-gradient fraction, and overall dissimilarity) using the Bray-Curtis family of dissimilarity indices. We also computed the corresponding presence/absence-based multiple-site dissimilarities, which account for the spatial turnover and nestedness components of beta diversity and their sum, using beta.multi(index.family = "sorensen"). To compute abundance-based beta diversity, we used unrarefied OTU tables to retain quantitative abundance information. Rarefaction can eliminate natural abundance gradients, resulting in inflated balanced dissimilarity and zero-valued gradient components in Bray-Curtis partitioning (Baselga 2017). We therefore retained untransformed counts for βBRAY partitioning, and only applied rarefied tables for presence/absence-based analyses (e.g., βSOR).
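The effect of rarefaction on the gradient component can be seen in a small worked example. The Python sketch below implements the pairwise (two-sample) form of the Bray-Curtis partition rather than the multiple-site form computed with betapart, so it is a simplification; the function name and toy counts are hypothetical.

```python
def bray_partition(x, y):
    """Split pairwise Bray-Curtis into balanced-variation and abundance-gradient parts."""
    a = sum(min(p, q) for p, q in zip(x, y))  # abundance shared by both samples
    b = sum(x) - a                            # abundance found only in x
    c = sum(y) - a                            # abundance found only in y
    total = (b + c) / (2 * a + b + c)
    balanced = min(b, c) / (a + min(b, c))
    return {"bray": total, "balanced": balanced, "gradient": total - balanced}

# Unequal totals (80 vs 20 reads): the difference is a pure abundance gradient.
unequal = bray_partition([40, 30, 10], [10, 5, 5])

# Equal totals (80 vs 80 reads), as after rarefaction: b == c, so gradient is zero.
equal = bray_partition([40, 30, 10], [20, 20, 40])
```

When both samples have the same total, the unshared abundances b and c are necessarily equal, so the balanced component absorbs the whole dissimilarity. That is the arithmetic behind using unrarefied counts for the abundance-based partition.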

The statistical significance of these explanatory experimental factors was assessed using adonis2(), a function based on PERMANOVA, within the vegan R package (McArdle and Anderson, 2001). adonis2() performs a permutational (n = 999 permutations) multivariate analysis of variance that partitions the variation in our Bray-Curtis distance matrices among sources of variation (Anderson, 2001).
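For readers unfamiliar with the test, the following Python sketch shows the core of a one-factor PERMANOVA: a pseudo-F statistic computed from a distance matrix, with a p-value obtained by shuffling group labels. It is a simplified stand-in, not a reimplementation of adonis2() (no multi-factor models, no strata); the function names and the toy data are assumptions.

```python
import random
from itertools import combinations

def pseudo_f(dist, groups):
    """Pseudo-F for one factor, computed from squared pairwise distances."""
    n = len(groups)
    labels = sorted(set(groups))
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for g in labels:
        members = [i for i in range(n) if groups[i] == g]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(members, 2)) / len(members)
    ss_among = ss_total - ss_within
    a = len(labels)
    return (ss_among / (a - 1)) / (ss_within / (n - a))

def permanova(dist, groups, permutations=999, seed=0):
    """Return the observed pseudo-F and a permutation p-value."""
    rng = random.Random(seed)
    f_obs = pseudo_f(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        # Count permutations at least as extreme as the observed statistic.
        if pseudo_f(dist, shuffled) >= f_obs * (1 - 1e-9):
            hits += 1
    return f_obs, (hits + 1) / (permutations + 1)

# Hypothetical 1-D "communities": two tight clusters, five samples per group.
points = [0.0, 0.1, 0.2, 0.3, 0.4, 5.0, 5.1, 5.2, 5.3, 5.4]
dist = [[abs(p - q) for q in points] for p in points]
groups = ["control"] * 5 + ["HFD"] * 5

f_obs, p_value = permanova(dist, groups)
```

The observed statistic is counted among the permutations (the `+ 1` terms), so the smallest attainable p-value with 999 permutations is 0.001.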

The distinctness of the … communities was assessed using the Random Forests classification algorithm (Breiman, 2001), using 1000 trees. As implemented in the R package ‘randomForest’, the algorithm constructs each tree using a different bootstrap sample from the original data (approximately 1/3 of the cases are left out of the bootstrap sample and not used in the construction of the kth tree), thus providing an unbiased estimate of the test set error without the need for a separate cross-validation test.
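The "approximately 1/3" figure follows from bootstrap sampling: when n cases are drawn with replacement, each case is missed with probability (1 - 1/n)^n, which approaches 1/e (about 0.368) as n grows. A short Python check, using a hypothetical n of 1000 cases rather than any sample size from this study:

```python
import random

def out_of_bag_fraction(n_cases, seed=0):
    """Fraction of cases never drawn in one bootstrap sample of size n_cases."""
    rng = random.Random(seed)
    in_bag = {rng.randrange(n_cases) for _ in range(n_cases)}
    return 1 - len(in_bag) / n_cases

n_cases = 1000  # hypothetical number of cases
expected = (1 - 1 / n_cases) ** n_cases   # theoretical out-of-bag probability
observed = out_of_bag_fraction(n_cases)   # one simulated bootstrap draw
```

These held-out cases serve as the test set for the tree built from the corresponding bootstrap sample.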

2 Introduction

3 Results


We first used a community ecology framework to assess … Our [analytical aims] were to 1) characterize the baseline community variation …, 2) assess temporal trajectories within and among diet cohorts, and 3) build on the previous aims to narrow in on key HFD-driven, temporally relevant perturbations in community structure. By partitioning community dissimilarity [… explain betapart rationale] … Following Mori, Isbell, and Seidl (2018), we treat β-diversity not just as a pattern, but as a mechanistic lens to infer how microbial community assembly processes shape functional outcomes under …diets… over time.

3.1 Research Question 1: What is the magnitude and nature of baseline variation? [A: relatively stable and species-rich?]

Section goals:

  • Understand the baseline heterogeneity among starting microbiomes before dietary intervention. / Establish baseline expectations [null model for future divergence comparisons.]
  • Find shared and unique OTUs among week 0 samples, compare with community membership across study period
  • Identify range of dissimilarity, dominant mechanisms (turnover vs nestedness).
  • Find any anomalous [pairwise?] outliers.

the baseline compositional heterogeneity (or lack thereof) among samples that began with identical conditions.

This is important because:

  • It defines how much divergence we expect by chance or stochasticity
  • It allows us to say later, “This diet/timepoint exceeded initial inter-individual variability”
  • It gives us a framework to talk about assembly (even at T0)

3.1.1 Characterizing Day 0 Communities

  • Add stability of control diet over time

  • consider all control-diet OTUs



Comparing to full dataset/across all weeks

The initial/starting communities (n = 5) contained over half of all of the OTUs detected across the whole dataset (364 of the total 624 OTUs, or 58.3%). Each of the 5 random Week-0 stool samples contained 301 to 311 OTUs.

Families Turicibacteraceae and Rikenellaceae and genera Alistipes, Duncaniella (G-), Limosilactobacillus, and Turicibacter were in the top 10 for Week-0 but not for the whole dataset. Conversely, Bacteroidaceae and Bifidobacteriaceae were in the top 10 families for the whole dataset but not for Week-0, as were the genera Bifidobacterium, Faecalibaculum, uncl_Erysipelotrichaceae, and uncl_Oscillospiraceae.

  • About Duncaniella: - “We performed a large-scale experiment using 579 genetically identical laboratory mice from a single animal facility, designed to identify the causes of disease variability in the widely used dextran sulphate sodium mouse model of inflammatory bowel disease. Commonly used treatment endpoint measures—weight loss and intestinal pathology—showed limited correlation and varied across mouse lineages. Analysis of the gut microbiome, coupled with machine learning and targeted anaerobic culturing, identified and isolated two previously undescribed species, Duncaniella muricolitica and Alistipes okayasuensis, and demonstrated that they exert dominant effects in the dextran sulphate sodium model leading to variable treatment endpoint measures. We show that the identified gut microbial species are common, but not ubiquitous, in mouse facilities around the world, and suggest that researchers monitor for these species to provide experimental design opportunities for improved mouse models of human intestinal diseases.” (Forster et al. (2022))

3.1.1.1 [Dissimilarity Partitioning…]

These communities had an overall abundance-based multiple-site dissimilarity of 0.175 (betapart::beta.multi.abund()) and a presence/absence-based total multiple-site dissimilarity of 0.204 (betapart::beta.multi()). The turnover or species replacement component, measured as the Simpson dissimilarity, represented 93.99% of the total presence/absence-based dissimilarity. Pairwise comparisons of Week-0 communities showed that a minimum of 23 and a maximum of 30 OTUs were not shared between pairs of these initial samples.

The low dissimilarity values of Week-0 samples (abundance-based multiple-site dissimilarity of 0.175 from betapart::beta.multi.abund() and presence/absence-based total multiple-site dissimilarity of 0.204) indicate that the baseline communities were relatively homogeneous. Closer examination of these differences revealed that the vast majority (93.99%) of dissimilarity was attributable to turnover, suggesting that even at baseline, species replacement, not simple richness differences, was the primary mechanism differentiating these microbiomes. This level of heterogeneity at T0 provides critical context: downstream changes due to diet must exceed this baseline variability to be considered biologically meaningful. This Week-0 landscape acts as a null model against which future community divergence (due to diet and time) can be compared, and should be factored into interpretation of assembly trajectories.
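For reference, the multiple-site Sørensen partition behind these numbers can be sketched in Python. This follows the multiple-site formulas as I understand them from Baselga's framework and is meant as an illustration, not a substitute for betapart::beta.multi(); the function name and the toy OTU sets are hypothetical.

```python
from itertools import combinations

def beta_multi_sorensen(sites):
    """Multiple-site Sorensen partition for a list of sets of OTU identifiers.

    Returns turnover (SIM), nestedness-resultant (SNE) and total (SOR) dissimilarity.
    """
    richness_sum = sum(len(s) for s in sites)
    total_richness = len(set().union(*sites))
    shared_term = richness_sum - total_richness
    min_sum = max_sum = 0
    for s, t in combinations(sites, 2):
        only_s, only_t = len(s - t), len(t - s)  # OTUs unique to each member of the pair
        min_sum += min(only_s, only_t)
        max_sum += max(only_s, only_t)
    turnover = min_sum / (shared_term + min_sum)
    total = (min_sum + max_sum) / (2 * shared_term + min_sum + max_sum)
    return {"SIM": turnover, "SNE": total - turnover, "SOR": total}

# Pure turnover: equal richness, one OTU replaced in each sample.
replaced = beta_multi_sorensen([{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 6}])

# Pure nestedness: each sample is a subset of the richest one.
nested = beta_multi_sorensen([{1, 2, 3, 4}, {1, 2, 3}, {1, 2}])
```

Applied to the reported values, a turnover share of 93.99% of a total of 0.204 corresponds to a turnover component of roughly 0.192 and a nestedness component of roughly 0.012.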

3.1.2 Proportions of Rare vs. Dominant Taxa

Goals here:

1. Show [macro-level] microbial community structure shifts under HFD.

2. Highlight how dominant vs. low-abundance taxa change across groups (e.g., HFD vs. control).

3. Establish whether certain taxonomic tiers (e.g., >10%) become more/less dominant.

4. Lay groundwork for linking those shifts to intestinal barrier defects or downstream disease.



3.1.3 Figure 1 - Characterizing Day 0 Communities

Figure X. Community structure and compositional variability among Week-0 microbiotas. (A) Family-level taxonomic profiles are relatively consistent across five pre-intervention stool samples, with communities dominated by Lactobacillaceae, Muribaculaceae, unclassified Bacteroidales, and Lachnospiraceae (all families with a >10% mean relative abundance). (B - old) Observed richness ranged from 301 to 311 OTUs (mean = 304.2), and Shannon entropy ranged from 4.31 to 4.39 (mean = 4.35), indicating modest baseline heterogeneity. (C) Pairwise Bray–Curtis dissimilarities ranged from 0.06 to 0.08 (mean = 0.07), defining the magnitude of inter-individual variation at baseline. (D) Distance to group centroid (mean = 0.044) quantifies beta dispersion under shared, pre-intervention conditions. These data define a reference distribution of compositional variability (/baseline variability) that contextualizes subsequent changes under dietary exposure.



3.2 Temporal Beta Diversity Decomposition: Within vs. Between Timepoints & Diets (How communities change over time)


Section goals:

  • Quantify how beta diversity components (Sørensen: turnover/nestedness, Bray: balanced/gradient) evolve within each cohort across time.
  • Show temporal stabilization, dietary divergence, and succession mechanisms.

plots are in SI… Temporal beta diversity trends within cohorts

Add/switch to shared OTUs among baseline/control by week

3.2.1 Within-timepoint (community variation by week x diet)

3.2.1.1 Venn diagrams by diet by week

(still need to clean up legends, spacing, etc)





3.3 Diet-driven change over time



3.3.1 Finding G- taxa that consistently increase in HFD (for potential taxa-specific LPS characterization later)


  • Considering only Control-diet vs. HFD
  • I will filter to only G- taxa (later after confirming gram classification)
  • Enrichment Criteria
    • A taxon is considered HFD-enriched if:
      • (Time-matched contrast): It is more abundant in HFD vs. Control-diet at the same timepoint, and/or
        • For each week W, test if a given OTU is significantly more abundant in HFD vs Control.
        • captures diet effect at each timepoint
      • (Longitudinal contrast): It increases in HFD mice over time relative to Week-0 baseline.
        • For each OTU in HFD samples, test if it increases from Week-0 to later timepoints.
        • captures within-cohort/diet temporal shifts due to HFD
      • Other: a) require a minimal prevalence (e.g. present in ≥25% of samples in a given group/timepoint) to minimize noise; b) minimum mean relative abundance in HFD samples of 0.1% per week
      • We are collecting significant OTUs per timepoint to allow for identification of transient or persistent enrichments
      • Significance assessed via a Wilcoxon test per OTU with FDR correction (if the OTU is present in both groups)
    • We are investigating microbial shifts driven by a high-fat diet (HFD) over time, particularly in taxa (OTUs) that:
      • Emerge or expand in HFD-fed mice over time (even if absent at baseline/week 0) / If an OTU is undetected/absent (zero) in Week-0, but present in HFD at later weeks, we still want to consider it enriched — even if the Wilcoxon test isn’t valid due to lack of variance in one group.
      • Are not present or less abundant in baseline (Week-0)
      • May be transient or persistent, but still relevant to early or chronic HFD effects.
      • Are classified as Gram-negative, supporting hypotheses about endotoxin (LPS) exposure, intestinal permeability, and NAFLD progression.

Prevalence refers to the proportion of HFD samples in which each OTU was detected (>0 abundance).
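The filtering and multiple-testing steps in the criteria above can be sketched as follows. This Python example covers only the prevalence filter, the minimum mean relative abundance filter, and a Benjamini-Hochberg FDR adjustment; it does not run the Wilcoxon test itself. The function names, the toy abundances, and the example p-values are hypothetical, and Benjamini-Hochberg is assumed as the FDR method since the notes do not name one.

```python
def prevalence(abundances):
    """Proportion of samples in which the OTU was detected (> 0 abundance)."""
    return sum(1 for a in abundances if a > 0) / len(abundances)

def passes_filters(rel_abundances, min_prevalence=0.25, min_mean=0.001):
    """Keep OTUs present in >= 25% of samples with mean relative abundance >= 0.1%."""
    mean_abundance = sum(rel_abundances) / len(rel_abundances)
    return prevalence(rel_abundances) >= min_prevalence and mean_abundance >= min_mean

def benjamini_hochberg(pvalues):
    """Return FDR-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical relative abundances of two OTUs across five HFD samples in one week.
candidate = [0.0, 0.002, 0.004, 0.0, 0.003]
rare = [0.0, 0.0, 0.0, 0.0, 0.0004]

keep_candidate = passes_filters(candidate)
keep_rare = passes_filters(rare)

# Hypothetical per-OTU p-values from one week's HFD vs Control-diet comparison.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

Applying the filters before testing reduces the number of hypotheses, which in turn makes the FDR adjustment less severe for the OTUs that remain.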





3.4 Questions in-progress…

3.4.1 When these changes stabilize or diverge



3.4.2 Correlations with host phenotype data

Major indicators of disease state and the relevant time periods

Description of host phenotype markers. Data include host disease state and proxy measurements by diet and timepoint. (These are just my notes — please suggest improvements)

Quantitative indicators of steatosis/steatohepatitis/fibrosis/cirrhosis

Disease (Proxy) Measurements Control-diet HFD HFD-LA Villin-Cre-HFD
Overview/Notes Generally, increased IP by week 4 and disease state by week 6 No leaky gut phenotype ?
Indicator of barrier dysfunction Serum LPS (in healthy mice: 50-100 pg/mL) - Large molecule, needs defective barrier to cross into bloodstream (via portal vein -> then liver disease) Based on linear standard curve - Big jumps in weeks 3 & 4, then decreases but still high
Luminal LPS?
Dextran 4 kDa Flux Starts to increase vs control in week 3
Dextran 10 kDa Flux only week 12
Defective liver function serum ALT
? Percentage of weight gain throughout trial

Alterations in intestinal permeability and related markers

  • findings related to intestinal permeability measurements, including serum (and luminal + fecal?) LPS levels
    • Generally, week 4 saw increase in IP
    • Generally, week 6 saw disease state
    • LA treatments saw no leaky gut

Expression and activity of MLCK

  • MLCK mRNA expression, protein levels, and kinase activity (up-regulated)
  • I don’t have these data…

Activation of inflammatory pathways:

  • enterocyte TLR4/CD14 expression, IκBα phosphorylation, and NF-κB translocation (evidence of immune cell infiltration)
  • I don’t have these data…




4 Additional figures (likely SI)






6 Discussion

locked for now

7 Manuscript Guidelines (ignore)

Cell Host and Microbe

The main figure titles and legends should not be part of the image files, but should instead appear at the end of the main manuscript file.

Please ensure they have clear file names (e.g., Figure 1.tif) and are < 20 MB.

  • TIFF is recommended for bitmap (line art), grayscale, and color images. TIFF supports several good compression schemes, ensuring that file sizes are kept to a minimum to aid easy file transfer. To downsize TIFF files, please use LZW compression.

  • PDF is recommended for any type of figure or image. High-quality PDFs can contain vector graphics as well as pixel-based images in their original formats, preserving your figures as you intend them to be displayed. To downsize PDF files, use the “Reduced Size PDF” option from the File > Save As menu.

Each figure must fit on a single page. We recommend that figures be a maximum of 6.5 x 8 in (16.5 x 20 cm) to allow for page margins and text. When your article is typeset, we will try to include the entire figure caption/legend below each image; if the figure is too large, this may not be possible. Maximum widths are as follows:

  • 8.5 cm (1 column) ➜ 3.34”
  • 11.4 cm (1.5 columns) ➜ 4.48”
  • 17.4 cm (full width of the page) ➜ 6.85”


7.1 Guidelines for artwork preparation

If your paper is accepted for publication, we ask that you consider the following when preparing your final figures for production:

  • Always embed fonts, and use only Arial fonts
  • When using layers, reduce to one layer (flatten artwork) before saving your image
  • Different panels should be labeled with capital letters
  • Text should be about 6–8 pt at the desired print size
  • Figure resolutions should be as follows: for color or grayscale figures, at least 300 dpi; for black and white figures, at least 500 dpi; for line-art figures, at least 1,000 dpi at the desired print size
  • Make sure that any raster artwork within the source document is at the appropriate minimum resolution
  • If used, color should be encoded as RGB
  • Limit vertical space between parts of an illustration to only what is necessary for visual clarity
  • Line weights or stroke widths should be in the 0.5–1.5 pt range
  • Gray fills should be kept at least 20% different from other fills and no lighter than 10% or darker than 80%

7.2 Graphical abstracts

The graphical abstract is a single-panel, square image that is designed to give readers an immediate understanding of the take-home message of the paper. Its intent is to encourage browsing, promote interdisciplinary scholarship, and help readers quickly identify which papers are most relevant to their research interests. Please refer to the Cell Press graphical abstracts guidelines for examples of and comprehensive instructions for graphical abstracts.

References

Forster, Samuel C, Simon Clare, Benjamin S Beresford-Jones, Katherine Harcourt, George Notley, Mark D Stares, Nitin Kumar, et al. 2022. “Identification of Gut Microbial Species Linked with Disease Variability in a Widely Used Mouse Model of Colitis.” Nature Microbiology 7 (4): 590–99. https://doi.org/10.1038/s41564-022-01094-z.
Mori, Akira S., Forest Isbell, and Rupert Seidl. 2018. “β-Diversity, Community Assembly, and Ecosystem Functioning.” Trends in Ecology & Evolution 33 (7): 549–64. https://doi.org/10.1016/j.tree.2018.04.012.